Hotel in Iraqi capital Baghdad struck as attacks on US embassy intercepted
Drone strike hits Al-Rasheed Hotel in Baghdad's Green Zone near US embassy; no casualties reported

A prominent hotel in central Baghdad's heavily fortified Green Zone was struck by a drone, amid reports that Iraqi air defences intercepted an attack over the United States Embassy.

The strike on Monday evening hit the top floor of the Al-Rasheed Hotel, causing damage but no casualties, according to two Iraqi security officials cited by The Associated Press (AP) news agency.

Security sources told the Reuters news agency that two Katyusha rockets had been intercepted that evening near the US Embassy in the Green Zone, which houses diplomatic missions as well as international institutions and government offices.

Earlier on Monday, the Iran-backed Kataib Hezbollah announced that Abu Ali Al-Askari, a prominent security official with the paramilitary group, had been killed, without giving details of the circumstances.
- North America > United States (1.00)
- Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.86)
- Asia > Middle East > Iran (0.67)
- (12 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- Government > Foreign Policy (1.00)
- Asia > Middle East > Iran (1.00)
- Asia > Middle East > Iraq > Kurdistan Region (0.17)
- Asia > North Korea (0.14)
- (19 more...)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Regional Government > Asia Government > Middle East Government > Iraq Government (0.70)
- Information Technology > Communications > Social Media (0.98)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences of up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (38 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
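A quick way to see the disparity the abstract above describes is to run the same sentence through one multilingual tokenizer and compare sequence lengths. The sketch below uses Hugging Face's `transformers` with `xlm-roberta-base` as an illustrative model choice; the sample sentences are rough translations and are not drawn from the paper.

```python
# Compare how many tokens the same sentence costs in different languages.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # any multilingual tokenizer

# Rough translations of the same sentence (illustrative only).
samples = {
    "English": "The weather is nice today.",
    "German": "Das Wetter ist heute schön.",
    "Tamil": "இன்று வானிலை நன்றாக உள்ளது.",
}

baseline = len(tokenizer.encode(samples["English"], add_special_tokens=False))
for lang, text in samples.items():
    n = len(tokenizer.encode(text, add_special_tokens=False))
    print(f"{lang:8s} {n:3d} tokens ({n / baseline:.2f}x English)")
```

Languages written in scripts underrepresented in the tokenizer's training data typically come out several times longer, which translates directly into higher inference cost and smaller effective context windows for those languages.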
Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO
Blagoev, Nikolay, Ersoy, Oğuzhan, Chen, Lydia Yiyu
Group Relative Policy Optimization (GRPO) has demonstrated great utility in the post-training of Large Language Models (LLMs). In GRPO, prompts are answered by the model and, through reinforcement learning, preferred completions are learnt. Owing to its small communication volume, GRPO is inherently suitable for decentralised training, as prompts can be answered concurrently by multiple nodes and the completions exchanged in the form of strings. In this work, we present the first adversarial attack on decentralised GRPO. We demonstrate that malicious parties can poison such systems by injecting arbitrary malicious tokens into benign models in both out-of-context and in-context attacks. Using empirical examples of math and coding tasks, we show that adversarial attacks can easily poison the benign nodes, polluting their local LLM post-training and achieving attack success rates of up to 100% in as few as 50 iterations. We propose two ways to defend against these attacks, depending on whether all users train the same model or different models. We show that these defenses can achieve stop rates of up to 100%, making the attack impossible.
- Europe > Austria > Vienna (0.14)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (5 more...)
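The attack surface is easier to see with the group-relative advantage that gives GRPO its name. The sketch below is a minimal, textbook-style rendition (not code from the paper): each prompt is answered several times, and each completion is scored only against its own group, so a peer that injects poisoned completion strings directly shifts what the local model is trained to prefer.

```python
# Group-relative advantages: normalize each completion's reward against its group.
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Zero-mean, unit-std normalization of one prompt's completion rewards."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mean) / std for r in rewards]

# In decentralised GRPO the group is assembled from strings sent by peer nodes,
# and nothing in the exchange verifies they came from an honest model.
group_rewards = [1.0, 0.0, 0.0, 1.0]  # e.g., pass/fail from a math verifier
print(grpo_advantages(group_rewards))  # [1.0, -1.0, -1.0, 1.0]
```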
DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models
Altakrori, Malik H., Habash, Nizar, Freihat, Abdelhakim, Samih, Younes, Chirkunov, Kirill, AbuOdeh, Muhammed, Florian, Radu, Lynn, Teresa, Nakov, Preslav, Aji, Alham Fikri
We present DialectalArabicMMLU, a new benchmark for evaluating the performance of large language models (LLMs) across Arabic dialects. While recently developed Arabic and multilingual benchmarks have advanced LLM evaluation for Modern Standard Arabic (MSA), dialectal varieties remain underrepresented despite their prevalence in everyday communication. DialectalArabicMMLU extends the MMLU-Redux framework through manual translation and adaptation of 3K multiple-choice question-answer pairs into five major dialects (Syrian, Egyptian, Emirati, Saudi, and Moroccan), yielding a total of 15K QA pairs across 32 academic and professional domains (22K QA pairs when also including English and MSA). The benchmark enables systematic assessment of LLM reasoning and comprehension beyond MSA, supporting both task-based and linguistic analysis. We evaluate 19 open-weight Arabic and multilingual LLMs (1B-13B parameters) and report substantial performance variation across dialects, revealing persistent gaps in dialectal generalization. DialectalArabicMMLU provides the first unified, human-curated resource for measuring dialectal understanding in Arabic, thus promoting more inclusive evaluation and future model development.
- Asia > Middle East > Qatar (0.28)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Saudi Arabia (0.14)
- (25 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
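For concreteness, a benchmark of this shape is usually consumed as a per-dialect accuracy loop over multiple-choice items. The sketch below assumes a generic record layout ('dialect', 'question', 'choices', 'answer') and a `predict` callable; neither is the released data schema, just an illustration of the task-based evaluation the abstract mentions.

```python
# Per-dialect accuracy over multiple-choice QA pairs.
from collections import defaultdict

def evaluate(examples, predict):
    """examples: dicts with 'dialect', 'question', 'choices', 'answer' (a letter)."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        pred = predict(ex["question"], ex["choices"])  # model returns 'A'..'D'
        total[ex["dialect"]] += 1
        correct[ex["dialect"]] += int(pred == ex["answer"])
    return {d: correct[d] / total[d] for d in total}
```

Comparing the resulting per-dialect scores against the MSA score is what surfaces the dialectal generalization gaps the abstract reports.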
Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
Pang, Bo, Kong, Deqian, Savarese, Silvio, Xiong, Caiming, Zhou, Yingbo
Reinforcement learning (RL) can elicit strong reasoning in large language models (LLMs), yet most open efforts focus on math and code. We propose Reasoning Curriculum, a simple two-stage curriculum that first elicits reasoning skills in pretraining-aligned domains such as math, then adapts and refines these skills across other domains via joint RL. Stage 1 performs a brief cold start followed by math-only RL with verifiable rewards to develop reasoning skills. Stage 2 runs joint RL on mixed-domain data to transfer and consolidate these skills. The curriculum is minimal and backbone-agnostic, requiring no specialized reward models beyond standard verifiability checks. Evaluated on Qwen3-4B and Llama-3.1-8B over a multi-domain suite, Reasoning Curriculum yields consistent gains. Ablations and a cognitive-skill analysis indicate that both stages are necessary and that math-first elicitation increases cognitive behaviors important for solving complex problems. Reasoning Curriculum provides a compact, easy-to-adopt recipe for general reasoning.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
- Asia > Middle East > Iraq > Basra Governorate > Basra (0.04)
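The two-stage recipe in the abstract above is simple enough to state as pseudocode. Below is a schematic sketch: every helper is a stand-in for real SFT/RL machinery, and all names are placeholders rather than the authors' API.

```python
# Two-stage curriculum: math-first RL with verifiable rewards, then joint RL.

def check_answer(completion: str, gold: str) -> float:
    """Verifiable reward: 1.0 iff the final answer matches the reference."""
    return 1.0 if completion.strip().endswith(gold) else 0.0

def reasoning_curriculum(model, math_data, mixed_data, sft_step, rl_step):
    # Stage 1: brief cold start, then math-only RL to elicit reasoning skills.
    model = sft_step(model, math_data[:1000])      # small cold-start slice
    model = rl_step(model, math_data, check_answer)
    # Stage 2: joint RL on mixed-domain data to transfer and consolidate them;
    # no specialized reward model, only per-domain verifiability checks.
    model = rl_step(model, mixed_data, check_answer)
    return model
```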
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Gao, Jiaxuan, Fu, Wei, Xie, Minyang, Xu, Shusheng, He, Chuyi, Mei, Zhiyu, Zhu, Banghua, Wu, Yi
Recent advancements in LLM-based agents have demonstrated remarkable capabilities in handling complex, knowledge-intensive tasks by integrating external tools. Among the diverse choices of tools, search tools play a pivotal role in accessing vast external knowledge. However, open-source agents still fall short of expert-level Search Intelligence: the ability to resolve ambiguous queries, generate precise searches, analyze results, and conduct thorough exploration. Existing approaches are limited in scalability, efficiency, and data quality; for example, the small turn limits in existing online RL methods (e.g., ≤10) restrict complex strategy learning. This paper introduces ASearcher, an open-source project for large-scale RL training of search agents. Our key contributions include: (1) scalable, fully asynchronous RL training that enables long-horizon search while maintaining high training efficiency, and (2) a prompt-based LLM agent that autonomously synthesizes high-quality and challenging QAs, creating a large-scale QA dataset. Through RL training, our prompt-based QwQ-32B agent achieves substantial improvements, with 78.0% and 34.3% Avg@4 gains on xBench and GAIA, respectively. Notably, our agent exhibits extreme long-horizon search, with tool calls exceeding 100 turns and output tokens exceeding 400k during training. With a simple agent design and no external LLMs, ASearcher-Web-QwQ achieves Avg@4 scores of 51.1 on xBench and 58.7 on GAIA, surpassing existing open-source 32B agents. Finally, we show that ASearcher-Web-QwQ can match the performance of commercial systems when using an external summary tool in a zero-shot transfer manner with test-time search. We open-source our models, training data, and code at https://github.com/inclusionAI/ASearcher.
- North America > United States > Arkansas > Pope County > Russellville (0.04)
- Asia > China (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Middle East > Iraq > Basra Governorate > Basra (0.04)
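The "extreme long-horizon search" claim is easiest to picture as an agent loop whose turn budget sits an order of magnitude above the ≤10-turn limits the abstract criticizes. The sketch below is a generic ReAct-style loop with stand-in `llm` and `web_search` callables, not ASearcher's actual interfaces.

```python
# Long-horizon search agent: alternate reasoning and tool calls until an answer.

def agentic_search(question: str, llm, web_search, max_turns: int = 128):
    history = [f"Question: {question}"]
    for _ in range(max_turns):
        action = llm("\n".join(history))       # model decides the next step
        if action.startswith("SEARCH:"):
            results = web_search(action[len("SEARCH:"):].strip())
            history.append(f"{action}\nResults: {results}")
        elif action.startswith("ANSWER:"):
            return action[len("ANSWER:"):].strip()
        else:
            history.append(action)             # free-form reasoning turn
    return None  # turn budget exhausted
```

Training such a loop with on-policy RL is what makes synchronous implementations slow: a single trajectory can run to a hundred tool calls, which is the bottleneck the fully asynchronous design targets.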
Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI
Sang, Jitao, Xiao, Jinlin, Han, Jiarun, Chen, Jilin, Chen, Xiaoyi, Wei, Shuyu, Sun, Yongjie, Wang, Yuhang
The rapid evolution of agentic AI marks a new phase in artificial intelligence, where Large Language Models (LLMs) no longer merely respond but act, reason, and adapt. This survey traces the paradigm shift in building agentic AI: from Pipeline-based systems, where planning, tool use, and memory are orchestrated by external logic, to the emerging Model-native paradigm, where these capabilities are internalized within the model's parameters. We first position Reinforcement Learning (RL) as the algorithmic engine enabling this paradigm shift. By reframing learning from imitating static data to outcome-driven exploration, RL underpins a unified solution of LLM + RL + Task across language, vision and embodied domains. Building on this, the survey systematically reviews how each capability -- Planning, Tool use, and Memory -- has evolved from externally scripted modules to end-to-end learned behaviors. Furthermore, it examines how this paradigm shift has reshaped major agent applications, specifically the Deep Research agent emphasizing long-horizon reasoning and the GUI agent emphasizing embodied interaction. We conclude by discussing the continued internalization of agentic capabilities like Multi-agent collaboration and Reflection, alongside the evolving roles of the system and model layers in future agentic AI. Together, these developments outline a coherent trajectory toward model-native agentic AI as an integrated learning and interaction framework, marking the transition from constructing systems that apply intelligence to developing models that grow intelligence through experience.
- Workflow (1.00)
- Overview (1.00)
- Research Report > New Finding (0.45)
- Instructional Material > Course Syllabus & Notes (0.45)
- Education (1.00)
- Health & Medicine (0.92)
- Leisure & Entertainment > Games (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (5 more...)
CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
Wang, Shaobo, Miao, Yongliang, Liu, Yuancheng, Ma, Qianli, Liao, Ning, Zhang, Linfeng
Large language models (LLMs) have demonstrated impressive reasoning capabilities, but scaling their performance often relies on massive reasoning datasets that are computationally expensive to train on. Existing data selection methods aim to curate smaller, high-quality subsets but often rely on costly external models or opaque heuristics. In this work, we shift the focus from external heuristics to the model's internal mechanisms. We find that complex reasoning tasks consistently activate a sparse, specialized subset of attention heads, forming core reasoning circuits. Building on this insight, we propose CircuitSeer, a novel data selection method that quantifies the reasoning complexity of data by measuring its influence on these crucial circuits. Extensive experiments on 4 models and 9 datasets demonstrate CircuitSeer's superiority. Notably, fine-tuning Qwen2.5-Math-7B on just 10% of data selected by our method achieves a 1.4-point gain in average Pass@1 over training on the full dataset, highlighting its efficiency and effectiveness.
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (2 more...)
- Automobiles & Trucks (0.68)
- Transportation > Ground > Road (0.47)
- Transportation > Electric Vehicle (0.47)
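The selection rule itself, as the abstract states it, can be sketched in a few lines: score each candidate example by how strongly it drives a fixed, sparse set of "reasoning circuit" attention heads, then keep the top fraction. In the sketch below, `head_activation` is a stub and the scoring is a plain sum; the paper's actual influence measure may differ.

```python
# Select training data by its activation of identified reasoning-circuit heads.

def select_by_circuit(examples, circuit_heads, head_activation, keep_frac=0.10):
    """circuit_heads: (layer, head) pairs; head_activation: strength per example."""
    scored = []
    for ex in examples:
        score = sum(head_activation(ex, layer, head) for layer, head in circuit_heads)
        scored.append((score, ex))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(scored) * keep_frac))   # e.g., the 10% used with Qwen2.5-Math-7B
    return [ex for _, ex in scored[:k]]
```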
HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission
Yang, Weihao, Huang, Hao, Wu, Donglei, Li, Ningke, Pan, Yanqi, Zheng, Qiyang, Xia, Wen, Li, Shiyi, Wang, Qiang
Mixture-of-Experts (MoE) has become a popular architecture for scaling large models. However, model scale is growing faster than what a single datacenter (DC) can train, driving a shift toward a more flexible, cross-DC training paradigm. Under this paradigm, Expert Parallelism (EP) for MoE faces significant scalability issues due to limited cross-DC bandwidth. Specifically, existing EP optimizations attempt to overlap data communication and computation, which offers little benefit in low-bandwidth scenarios because data communication takes far longer than computation. Poor cross-DC EP scaling is therefore fast becoming a critical roadblock to the continued growth of MoE models. To address this, we propose HybridEP, a modeling-guided framework for optimizing EP under constrained bandwidth. Our key idea is to dynamically transform the spatial placement of experts to reduce data communication traffic and frequency, thereby minimizing EP's communication overheads. However, finding the optimal solution is non-trivial because mixing data and expert communication complicates the original communication pattern. We therefore build a stream-based model to determine the optimal transmission ratio. Guided by this model, we incorporate two techniques: (1) domain-based partitioning to construct the mapping between hybrid patterns and a specific communication topology at the GPU level, and (2) parameter-efficient migration to further refine this topology by reducing expert transmission overhead and enlarging the domain size. Combining all these designs, HybridEP can be viewed as a more general form of EP with better scalability. Experimental results show that HybridEP outperforms existing state-of-the-art MoE training systems by up to 5.6x under constrained bandwidth. We further compare HybridEP and EP in large-scale simulations, where HybridEP achieves up to 1.45x speedup with 1k DCs under different bandwidths.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > Lebanon > Keserwan-Jbeil Governorate > Blat (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (6 more...)
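The core tradeoff is easy to caricature with a toy cost model: for a fraction r of experts migrated across the DC boundary, per-step cross-DC time is the token traffic still routed to remote experts plus expert-weight traffic amortized over a migration window. The linear form and all numbers below are illustrative assumptions, not the paper's stream-based model.

```python
# Toy cross-DC communication model: data transmission vs. expert migration.

def step_time(r: float, token_gb: float, expert_gb: float,
              window: int, bw_gbps: float) -> float:
    data = token_gb * (1.0 - r)        # tokens still sent to remote experts
    expert = expert_gb * r / window    # migration cost amortized over `window` steps
    return (data + expert) / bw_gbps

for r in (0.0, 0.5, 1.0):
    t = step_time(r, token_gb=4.0, expert_gb=64.0, window=100, bw_gbps=1.0)
    print(f"r={r:.1f}: {t:.2f} s/step cross-DC")
# Under these numbers migrating experts wins; with higher bandwidth or larger
# experts the balance shifts, which is what a ratio-picking model must capture.
```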